
    Modular Networks: Learning to Decompose Neural Computation

    Scaling model capacity has been vital in the success of deep learning. For a typical network, necessary compute resources and training time grow dramatically with model size. Conditional computation is a promising way to increase the number of parameters with a relatively small increase in resources. We propose a training algorithm that flexibly chooses neural modules based on the data to be processed. Both the decomposition and modules are learned end-to-end. In contrast to existing approaches, training does not rely on regularization to enforce diversity in module use. We apply modular networks both to image recognition and language modeling tasks, where we achieve superior performance compared to several baselines. Introspection reveals that modules specialize in interpretable contexts. Comment: NIPS 2018
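The conditional-computation idea in this abstract can be sketched in a few lines: a controller scores a set of modules, and each input is processed only by the module it is routed to, so parameters grow with the number of modules while per-example compute stays roughly constant. Everything below (the linear controller and modules, the hard argmax routing, the shapes) is an illustrative assumption, not the paper's actual architecture or training algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def modular_forward(x, controller_w, module_ws):
    """Toy conditional computation: a linear controller scores the modules,
    and each input row runs through its argmax module only."""
    scores = x @ controller_w                 # (batch, n_modules)
    choice = scores.argmax(axis=1)            # hard per-example module choice
    out = np.empty_like(x)
    for m, w in enumerate(module_ws):
        mask = choice == m
        if mask.any():
            out[mask] = x[mask] @ w           # only the chosen rows run module m
    return out, choice

dim, n_modules = 8, 4
controller_w = rng.standard_normal((dim, n_modules))
module_ws = [rng.standard_normal((dim, dim)) for _ in range(n_modules)]
x = rng.standard_normal((5, dim))
out, choice = modular_forward(x, controller_w, module_ws)
print(out.shape, choice)
```

Learning the routing end-to-end, as the paper describes, additionally requires making the discrete module choice trainable, which the hard argmax above deliberately glosses over.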

    Transfer Learning for Speech Recognition on a Budget

    End-to-end training of automated speech recognition (ASR) systems requires massive data and compute resources. We explore transfer learning based on model adaptation as an approach for training ASR models under constrained GPU memory, throughput and training data. We conduct several systematic experiments adapting a Wav2Letter convolutional neural network originally trained for English ASR to the German language. We show that this technique allows faster training on consumer-grade resources while requiring less training data in order to achieve the same accuracy, thereby lowering the cost of training ASR models in other languages. Model introspection revealed that small adaptations to the network's weights were sufficient for good performance, especially for inner layers. Comment: Accepted for the 2nd ACL Workshop on Representation Learning for NLP
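The introspection finding, that inner layers barely move during adaptation, can be made concrete with a simple per-layer metric: the relative change between pretrained and adapted weights. The layer names and synthetic weights below are illustrative stand-ins, not the actual Wav2Letter model:

```python
import numpy as np

rng = np.random.default_rng(1)

def relative_change(w_pre, w_adapted):
    """Relative Frobenius-norm change of a layer's weights, a simple
    introspection metric for where adaptation concentrated."""
    return np.linalg.norm(w_adapted - w_pre) / np.linalg.norm(w_pre)

# Synthetic "pretrained" layers standing in for a convnet's weight stack.
layers = {f"conv{i}": rng.standard_normal((64, 64)) for i in range(1, 4)}

# Simulate fine-tuning: inner layers drift little, the last layer more.
adapted = {n: w + 0.01 * rng.standard_normal(w.shape) for n, w in layers.items()}
adapted["conv3"] = layers["conv3"] + 0.2 * rng.standard_normal((64, 64))

changes = {n: relative_change(layers[n], adapted[n]) for n in layers}
for name, c in sorted(changes.items(), key=lambda kv: kv[1]):
    print(f"{name}: {c:.3f}")
```

A profile like this, computed over a real checkpoint pair, is one way to see which layers the cross-lingual adaptation actually had to change.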

    The Benefits of Model-Based Generalization in Reinforcement Learning

    Model-Based Reinforcement Learning (RL) is widely believed to have the potential to improve sample efficiency by allowing an agent to synthesize large amounts of imagined experience. Experience Replay (ER) can be considered a simple kind of model, which has proved extremely effective at improving the stability and efficiency of deep RL. In principle, a learned parametric model could improve on ER by generalizing from real experience to augment the dataset with additional plausible experience. However, owing to the many design choices involved in empirically successful algorithms, it can be very hard to establish where the benefits are actually coming from. Here, we provide theoretical and empirical insight into when, and how, we can expect data generated by a learned model to be useful. First, we provide a general theorem motivating how learning a model as an intermediate step can narrow down the set of possible value functions more than learning a value function directly from data using the Bellman equation. Second, we provide an illustrative example showing empirically how a similar effect occurs in a more concrete setting with neural network function approximation. Finally, we provide extensive experiments showing the benefit of model-based learning for online RL in environments with combinatorial complexity, but factored structure that allows a learned model to generalize. In these experiments, we take care to control for other factors in order to isolate, insofar as possible, the benefit of using experience generated by a learned model relative to ER alone.
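The contrast the abstract draws, ER replays only transitions that actually occurred while a model can generate plausible transitions from states the agent never stored, can be sketched in a toy chain MDP. The dynamics model below is assumed known for brevity; in the paper's setting it would be learned from the real experience:

```python
import random

random.seed(0)

# Toy 1-D chain MDP: states 0..4, actions move left/right, reward at the end.
N_STATES = 5

def true_step(s, a):
    """Deterministic chain dynamics; a is -1 or +1."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r

# Collect a small buffer of real experience (what ER would replay).
replay = []
s = 0
for _ in range(20):
    a = random.choice([-1, 1])
    s2, r = true_step(s, a)
    replay.append((s, a, r, s2))
    s = s2

# "Model" rollouts: transitions queried from every (state, action) pair,
# including pairs the real trajectory never visited -- the kind of
# generalization ER alone cannot provide.
imagined = [(s, a, *true_step(s, a)) for s in range(N_STATES) for a in (-1, 1)]

dataset = replay + imagined
print(len(replay), len(imagined), len(dataset))
```

A value-learning update run on `dataset` rather than `replay` alone is the augmented setting whose benefit the paper's experiments try to isolate.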

    Presentation of the first issue of Travail et apprentissages. Revue de didactique professionnelle

    There is decidedly something new in the analysis of work! In this same column of issue 99 of Formation Emploi, Pierre Roche presented Dominique Lhuillier's book "Cliniques du Travail" (Roche, 2007). Today it is a new journal, "Travail et Apprentissages – Revue de Didactique Professionnelle", whose first issue appeared in February 2008. Published with the support of the association "Recherches et pratiques en didactiq…

    Causal diffusion and its backwards diffusion problem

    This article revisits the backwards diffusion problem by replacing the \emph{noncausal} diffusion equation, the direct problem, by the \emph{causal} diffusion model developed in \cite{Kow11} for the case of constant diffusion speed. For this purpose we derive an analytic representation of the Green function of causal diffusion in the wave vector-time space for arbitrary (wave vector) dimension $N$. We prove that the respective backwards diffusion problem is ill-posed, but not exponentially ill-posed, if the data acquisition time is larger than a characteristic time period $\tau$ ($2\,\tau$) for space dimension $N \geq 3$ ($N = 2$). In contrast to the noncausal case, the inverse problem is well-posed for $N = 1$. Moreover, we perform a theoretical and numerical comparison between causal and noncausal diffusion in the \emph{space-time domain} and the \emph{wave vector-time domain}. The paper is concluded with numerical simulations of the backwards diffusion problem via the Landweber method. Comment: In the replacement I have rewritten the abstract and the introduction. Moreover, I have added Remark 1 and simplified a little bit the proof of Theorem 4. The reference 25 is updated, since the paper is now published
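The Landweber method mentioned at the end is a standard iterative regularization for ill-posed linear problems $Ax = y$: repeat $x_{k+1} = x_k + \omega A^\top (y - A x_k)$. A minimal sketch for a toy backwards-diffusion problem, using the noncausal model in the wave-vector domain where diffusion over time $t$ acts as multiplication by $e^{-k^2 t}$ (the constants and discretization are illustrative, not the paper's causal model):

```python
import numpy as np

# Toy backwards diffusion in the wave-vector domain: the forward operator
# is diagonal, multiplying each mode k by exp(-k^2 t).
k = np.arange(1, 9, dtype=float)        # wave numbers
t = 0.3
A = np.exp(-k**2 * t)                    # diagonal forward operator
x_true = 1.0 / k                         # initial state to recover
y = A * x_true                           # data: the diffused state at time t

omega = 0.9                              # step size; needs omega * max(A)**2 < 2
x = np.zeros_like(y)
for _ in range(5000):
    x = x + omega * A * (y - A * x)      # Landweber update x += w A^T (y - A x)

# High wave numbers are damped almost to zero by the forward map, so they
# recover extremely slowly -- the ill-posedness shows up as modes that the
# iteration effectively never reconstructs.
err = np.abs(x - x_true)
print(err[:3], err[-3:])
```

Stopping the iteration early acts as the regularization: low-frequency content converges quickly while the amplification of noisy high-frequency modes is kept in check.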

    The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute

    The Languini Kitchen serves as both a research collective and codebase designed to empower researchers with limited computational resources to contribute meaningfully to the field of language modelling. We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. The number of tokens on which a model is trained is defined by the model's throughput and the chosen compute class. Notably, this approach avoids constraints on critical hyperparameters which affect total parameters or floating-point operations. For evaluation, we pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length. On it, we compare methods based on their empirical scaling trends, which are estimated through experiments at various levels of compute. This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput. While the GPT baseline achieves better perplexity throughout all our levels of compute, our LSTM baseline exhibits a predictable and more favourable scaling law. This is due to the improved throughput and the need for fewer training tokens to achieve the same decrease in test perplexity. Extrapolating the scaling laws of both models results in an intersection at roughly 50,000 accelerator hours. We hope this work can serve as the foundation for meaningful and reproducible language modelling research.
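The compute-class bookkeeping the protocol describes reduces to one multiplication: the training-token budget follows from measured throughput and the chosen budget in accelerator hours. The throughput figures below are made-up placeholders, not the paper's measurements:

```python
# Token budget implied by a compute class: tokens = throughput * budget.
def token_budget(tokens_per_second: float, accelerator_hours: float) -> int:
    """Number of training tokens a model earns within a compute class."""
    return int(tokens_per_second * accelerator_hours * 3600)

# A faster model earns proportionally more tokens in the same compute class;
# these throughput numbers are hypothetical.
gpt_tokens = token_budget(tokens_per_second=20_000, accelerator_hours=6)
lstm_tokens = token_budget(tokens_per_second=200_000, accelerator_hours=6)
print(gpt_tokens, lstm_tokens)  # the ten-fold throughput yields 10x the tokens
```

This is what lets the protocol compare architectures fairly without pinning parameter counts or FLOPs: a higher-throughput model simply trains on more tokens for the same accelerator hours.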

    A test of positive suggestions about side effects as a way of enhancing the analgesic response to NSAIDs

    Side effects are frequent in pharmacological pain management, potentially preceding analgesia and limiting drug tolerability. Discussing side effects is part of informed consent, yet can favor nocebo effects. This study aimed to test whether a positive suggestion regarding side effects, which could act as reminders of the medication having been absorbed, might favor analgesia in a clinical interaction model. Sixty-six healthy males participated in a study "to validate pupillometry as an objective measure of analgesia". Participants were unknowingly randomized double-blind to positive vs control information about side effects embedded in a video regarding the study drugs. Sequences of moderately painful heat stimuli applied before and after treatment with diclofenac and atropine served to evaluate analgesia. Atropine was deceptively presented as a co-analgesic, but used to induce side effects. Adverse events (AE) were collected with the General Assessment of Side Effects (GASE) questionnaire prior to the second induced pain sequence. Debriefing fully informed participants regarding the purpose of the study and showed them the two videos. The combination of medication led to significant analgesia, without a between-group difference. Positive information about side effects increased the attribution of AE to the treatment compared to the control information. The total GASE score was correlated with analgesia, i.e., the more AEs reported, the stronger the analgesia. Interestingly, there was a significant between-groups difference in this correlation: the GASE score and analgesia correlated only in the positive information group. This provides evidence for a selective link between AEs and pain relief in the group who received the suggestion that AEs could be taken as a sign "that help was on the way". During debriefing, 65% of participants said they would prefer to receive the positive message in a clinical context.
Although the present results cannot be translated immediately to clinical pain conditions, they do indicate the importance of testing this type of modulation in a clinical context.